Analysis of human papillomavirus type 16 E4, E5 and L2 gene variations among women with cervical infection in Xinjiang, China

Background There is a high incidence of cervical cancer in Xinjiang. Genetic variation in human papillomavirus (HPV) may increase its ability to invade, spread, and escape the host immune response. Methods The HPV16 genome was sequenced in 90 HPV16-positive samples. Sequences of the E4, E5 and L2 genes were analysed to reveal the sequence variation of HPV16 in Xinjiang and the distribution of variation among the HPV16-positive samples. Results Eighty-one of the 90 HPV16-positive samples showed variation in the HPV16 E4 gene at 18 nucleotide variation sites, with 8 synonymous and 11 missense variations. All 90 samples showed variation in the HPV16 E5 and L2 genes, with 16 nucleotide variation sites in the E5 gene (6 synonymous, 11 missense variations) and 100 nucleotide variation sites in the L2 gene (37 synonymous, 67 missense variations). The frequency of the HPV16 L2 gene missense variations G3377A, G3599A, G3703A, and G3757A was higher in the case groups than in the control groups. Conclusions Phylogenetic tree analysis showed that 87 samples were European strains and 3 were Asian strains; no other strains were found, and G4181A was related to the Asian strains. The HPV16 L2 gene missense variations G3377A, G3599A, G3703A, and G3757A were significantly more frequent in the case groups than in the control groups. Supplementary Information The online version contains supplementary material available at 10.1186/s12920-024-01926-3.


Introduction
Cervical cancer (CC) is the fourth leading cause of death from malignancy among women, with 604,000 new cases and 342,000 deaths worldwide in 2020 according to the World Health Organization (WHO) [1][2][3]. The occurrence and development of cervical cancer are related to the economic and health status of a region, and the incidence and mortality of cervical cancer in Xinjiang, China, remain very high [4,5].
Cancer etiology research over the past 25 years has identified persistent infection with high-risk human papillomavirus (HR-HPV) as a main cause of cervical cancer development and progression [6][7][8]. The HPV positive rate in Xinjiang is 14.02%, with HPV52, HPV53, HPV16 and HPV18 being the most common types and HPV16 being the most pathogenic of all HPV types [9,10]. HPV16 is the most dangerous and most preventable virus type. HPV16 is divided into four main variant lineages: the A lineage contains EUR (A1-A3) and As (A4); the B lineage contains AF-1; the C lineage contains AF-2; and the D lineage contains NA (D1), AAII (D2) and AAI (D3). The HPV16 variants that infect women in Xinjiang, China, are mostly European strains of the A lineage [11,12].
The complete HPV16 genome is approximately 7.9 kb and consists of six early genes (E1, E2, E4, E5, E6, and E7), two late genes (L1 and L2), and one long control region (LCR) [13]. The HPV16 E4 gene encodes the E4 protein, which is the most highly expressed HPV16 protein. Most of its amino acids are derived from the E4 ORF, which is contained in the E2 gene [14]. The E4 protein plays a role in viral transmission by enhancing viral replication and the excretion of virions [15,16]. The HPV16 E5 gene encodes a transmembrane protein of 83 amino acids that serves as an innate immune evasion factor. This transmembrane protein is involved in immune surveillance and immune evasion, leading to persistent viral infection. It also plays a central role in the regulation of the host immune system and is directly related to the initial stages of cervical cancer development [17,18]. The HPV L2 gene encodes a minor capsid protein, which promotes retrograde transport of the viral genome and its integration into the host genome during the intercellular phase, which may lead to irreversible changes in the cell [19,20].
The HPV16 E4, E5 and L2 genes affect a series of processes in virus invasion, immune escape and transmission, but the distribution of variation in the HPV16 E4, E5 and L2 genes in Xinjiang is not yet clear. Therefore, in this study, we focused on the variation in the HPV16 E4, E5 and L2 genes and its distribution in the case group and the control group.

Collection of samples
A total of 90 HPV16-positive cervical cell samples were collected from patients at Yili Friendship Hospital, Kashgar District People's Hospital and Shihezi University Affiliated Hospital. All samples were collected from 2016 to 2017, and the patients were 30 to 60 years old. The diagnosis of cervical cancer was confirmed by pathological examination according to "Diagnosis and Treatment, Obstetrics and Gynaecology" and the FIGO stage (International Federation of Gynaecology and Obstetrics, 2009). The inclusion criteria for the control groups were HPV16 positivity and the absence of lesions or inflammation in the cervix. Informed consent was obtained from all patients. None of the patients had a history of long-term travel or residence, and samples were collected and stored in a -80 °C low-temperature refrigerator. The pathological information of the samples is shown in Table S1 of the supplementary materials.

HPV genotyping
HPV genotyping (23 types) was performed with PCR-reverse dot blot hybridization technology (Shenzhen Co., Ltd., China). All detection procedures were conducted in accordance with the manufacturer's instructions [12].

DNA extraction and PCR amplification of samples
A DNA extraction kit (Tiangen Biochemical Co., Ltd.) was used to extract DNA, which was stored in a -20 °C freezer. The quality of the DNA samples was examined by 1% agarose gel electrophoresis, and the DNA samples were diluted to a working concentration of 10-20 ng/µL. Samples without DNA bands were re-extracted. The reaction mixture (40 µL) consisted of 20 µL of 2× Taq enzyme PCR SuperMix, 1 µL of forward primer (10 µmol/L), 1 µL of reverse primer (10 µmol/L), 2 µL of DNA sample, and 16 µL of ddH2O. The PCR cycling conditions were 94 °C for 5 min; 34 cycles of 94 °C for 30 s, 52 °C for 30 s, and 72 °C for 1 min; and 72 °C for 5 min. The quality of the PCR products was examined by 1% agarose gel electrophoresis, and samples with bright and regular bands at 650 bp were qualified for subsequent DNA sequencing. PCR products were stored in a -20 °C freezer. Information on the primers is shown in Table 1.
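The reaction recipe and cycling program above can be checked with a few lines of arithmetic. The following is a minimal sketch rather than part of the published protocol: the volumes, temperatures and times are taken from the text, while the function names and the 10% pipetting overage are hypothetical choices made for illustration.

```python
# Per-reaction volumes (µL) as stated in the text for one 40 µL reaction.
PER_REACTION_UL = {
    "2x Taq PCR SuperMix": 20.0,
    "forward primer (10 µmol/L)": 1.0,
    "reverse primer (10 µmol/L)": 1.0,
    "DNA sample": 2.0,
    "ddH2O": 16.0,
}

# Cycling program from the text, as (temperature in °C, hold time in seconds).
INITIAL_DENATURATION = (94, 300)
CYCLE_STEPS = [(94, 30), (52, 30), (72, 60)]
N_CYCLES = 34
FINAL_EXTENSION = (72, 300)


def total_volume_ul(recipe):
    """Sum of all component volumes for a single reaction."""
    return sum(recipe.values())


def master_mix(recipe, n_reactions, overage=0.10):
    """Scale the shared reagents for n reactions.

    The DNA sample is excluded because it is added to each tube separately.
    The default 10% overage is a hypothetical pipetting allowance, not a
    value from the paper.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in recipe.items()
        if name != "DNA sample"
    }


def program_seconds():
    """Total hold time of the program, ignoring ramping between steps."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


def final_primer_concentration(stock_umol_per_l=10.0, primer_ul=1.0, total_ul=40.0):
    """Primer concentration in the reaction after dilution (µmol/L)."""
    return stock_umol_per_l * primer_ul / total_ul
```

Under these figures the components add up to the stated 40 µL, each primer is diluted to 0.25 µmol/L in the reaction, and the program holds for 78 min in total before ramp times are counted.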

Sequencing
Sequencing was performed by Shanghai Sangon, the Beijing Genomics Institute and other sequencing companies. The PCR product was purified with SAP (Promega) and Exo I (Epicentre): 0.5 U of SAP and 4 U of Exo I were added to 8 µL of PCR product. The mixture was

Phylogenetic analysis of HPV16 variants
The raw sequences were assembled with Molecular Evolutionary Genetics Analysis (MEGA) software and were aligned with the European prototype virus strain (GenBank: NC_001526.

Statistical analysis
The frequency of variation in the HPV16 E4, E5, and L2 genes was counted directly. SPSS 26.0 software was used to analyze the statistical results and the correlation between HPV16 E4, E5, and L2 single nucleotide variations and cervical cancer. In the chi-square test, P values < 0.05 were considered statistically significant.
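To illustrate the chi-square comparison of variation frequency between the case and control groups, the sketch below computes a Pearson chi-square statistic for a 2 × 2 table using only the Python standard library. It is an illustration under stated assumptions, not the SPSS procedure itself: the text does not say whether a continuity correction was applied, and the function name `chi_square_2x2` is hypothetical. The example counts (36 of 43 cases and 32 of 47 controls carrying A3115C) are back-calculated from the frequencies of 83.72% and 68.09% reported in the E5 results.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square upper tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Assumed counts, back-calculated from the reported percentages:
# rows = case / control, columns = A3115C present / absent.
chi2, p = chi_square_2x2(36, 7, 32, 15)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

With these assumed counts the P value falls above 0.05, in line with the reported absence of a significant difference for A3115C across all 90 samples.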

Results
Sequence ). No missense variation site was found in more than 1 sample. The frequency of missense variations was much lower than that of synonymous variations, indicating that the E4 gene was relatively conserved and had a stabilizing effect on the spread of the virus.

Table 2 HPV16 E4 gene variation and amino acid changes. Columns: variation sites; AA; As; Af; number of variation samples (n = 81). AA is the Asian-American strain, As is the Asian strain, and Af is the African strain.
All 90 HPV16-positive samples had HPV16 E5 sequence variations, at 16 sites with 6 synonymous and 11 missense variations. As shown in Table 3, the most frequent synonymous variation in the E5 gene was at nt3213 (A-T) (20/90, 22.22%). The missense variations found in more than 1 sample were C2995T, A3115C, T3122C, C3127A/G, and A3178G, leading to the amino acid changes leucine to phenylalanine (L4F), isoleucine to leucine (I44L), valine to alanine (V46A), leucine to isoleucine/valine (L48I/V) and isoleucine to valine (I65V), respectively. The A3115C and A3178G variations had very high frequencies, 75.56% and 100%, respectively, and the simultaneous occurrence of these two variations may significantly change the structure of the E5 protein and indirectly change the immune escape ability of the virus.

Phylogenetic tree analysis of the nucleotide sequences of HPV16 E4, E5 and L2
The maximum likelihood phylogenetic tree constructed from the HPV16 E4, E5 and L2 gene sequences showed that 87 of the 90 HPV16-positive samples were European strains and 3 samples were Asian strains. No African, American or Asian-American strains were found. The Asian strains were associated with the missense variant G4181A. The phylogenetic tree is shown in Fig. 1.

Genetic variation of genomic HPV16 E4, E5 and L2 in the case and control groups
Genetic variation of genomic HPV16 E4 in the case and control groups
According to the pathological information, the samples comprised 47 in the control groups and 43 in the case groups. There were 6 synonymous variations and 1 missense variation in the control groups (non-cervical cancer group) (Table 5). In comparison, there were 7 synonymous variations and 10 missense variations in the case groups (cervical cancer group). The sequence variations did not differ significantly between the control groups and the case groups (P > 0.05) (Table 5). Most of the missense variations appeared in the case groups, suggesting a trend toward E4 gene missense variation in the case groups.

Genetic variation of genomic HPV16 E5 in the case and control groups
The control groups (non-cervical cancer group) had 3 synonymous variations and 8 missense variations. In comparison, the case groups (cervical cancer group) had 4 synonymous variations and 8 missense variations (Table 6). The frequency of the synonymous variation A3213T was significantly higher in the case groups than in the control groups (P = 0.024). The frequency of the missense variation A3115C in the case groups was 83.72%, higher than that in the control groups (68.09%), but the difference was not statistically significant (P > 0.05).

Genetic variation of genomic HPV16 L2 in the case and control groups
There were 24 synonymous variations and 44 missense variations in the control groups and 22 synonymous variations and 44 missense variations in the case groups (Table 7). Among these, the frequencies of the missense variations G3377A (P = 0.036), G3599A (P = 0.004), G3703A (P = 0.038), and G3757A (P = 0.019) were significantly higher in the case groups than in the control groups (P < 0.05). The corresponding amino acid changes were arginine to glutamine (R2Q), glycine to glutamic acid (G76E), glutamate to lysine (E111K), and aspartate to asparagine (D129N).
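The nucleotide positions and amino acid positions reported for the E5 and L2 missense variations can be cross-checked by mapping each nucleotide to its codon. The sketch below does this under an explicit assumption: the ORF start positions (nt 2986 for E5 and nt 3373 for L2) are not stated in the text and are inferred here from the reported pairs themselves, so the check tests internal consistency only. The names `codon_of`, `E5_ORF_START` and `L2_ORF_START` are hypothetical.

```python
# Assumed ORF start positions (first base of codon 1), inferred from the
# reported nucleotide/amino-acid pairs; they are not given in the text.
E5_ORF_START = 2986
L2_ORF_START = 3373


def codon_of(nt_pos, orf_start):
    """Map a genome position to (codon number, position within codon).

    Both values are 1-based, so (44, 1) is the first base of codon 44.
    """
    offset = nt_pos - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset // 3 + 1, offset % 3 + 1


# (gene, ORF start, nucleotide position, reported amino acid position)
REPORTED = [
    ("E5", E5_ORF_START, 2995, 4),    # C2995T, L4F
    ("E5", E5_ORF_START, 3115, 44),   # A3115C, I44L
    ("E5", E5_ORF_START, 3122, 46),   # T3122C, V46A
    ("E5", E5_ORF_START, 3127, 48),   # C3127A/G, L48I/V
    ("E5", E5_ORF_START, 3178, 65),   # A3178G, I65V
    ("L2", L2_ORF_START, 3377, 2),    # G3377A, R2Q
    ("L2", L2_ORF_START, 3599, 76),   # G3599A, G76E
    ("L2", L2_ORF_START, 3703, 111),  # G3703A, E111K
    ("L2", L2_ORF_START, 3757, 129),  # G3757A, D129N
]

for gene, start, nt, aa in REPORTED:
    codon, pos_in_codon = codon_of(nt, start)
    status = "ok" if codon == aa else "MISMATCH"
    print(f"{gene} nt{nt}: codon {codon}, base {pos_in_codon} ({status})")
```

Under the assumed start positions, every reported missense variation falls on the first or second base of the stated codon, whereas the synonymous E5 variation A3213T maps to the third base of codon 76, as expected for a silent change.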


Discussion
Most research on HPV gene variation has focused on the E6 and E7 genes, while there are fewer studies on E4, E5, and L2 gene variations, and the understanding of HPV16 E4, E5 and L2 gene variations in Xinjiang is even more limited. The HPV16 E4, E5 and L2 genes dominate virion propagation, immune surveillance and escape, and the integration of the viral genome into the chromosomes of the nucleus [15,18]; sequence variation in the E4, E5 and L2 genes may therefore directly affect the ability of HPV to invade, escape immunity, and spread. We found that 87 (87/90, 96.67%) of the 90 HPV16-positive samples in Xinjiang were European strains, and the other 3 (3/90, 3.33%) were Asian strains. Through bioinformatics comparison, we speculated that the Asian strain was directly related to the L2 gene missense variation G4181A.
The literature reports that amino acid variations of the HPV16 E4 protein include T22A, P36T, A43K, Q53R, L62I and L62P, which are related to the severity of cervical malignancy [22]. The E4 gene also encodes the E1^E4 protein, the first five amino acids of which are derived from the E1 ORF, while the remaining amino acids are derived from the E4 ORF. Nine amino acid variations (A7V, A7P, L16I, D45E, L59I, L59T, Q66P, S72F, H75Q) were detected in the E1^E4 protein, and these were associated with the severity of cervical malignancy [23]. In this study, the amino acid variations found in the E4 protein (E1^E4 protein) were C7W, A10S (A7S), Y13C (Y10C), K40T (K37T), R43I (R40I), Q51H (Q48H), Q53H (Q50H), E67D (E64D), L79V (L76V), H82Y (H79Y) and T83R (T80R), all of which had low frequency. The frequency of amino acid variations in the E4 protein was much lower than that in the E5 and L2 proteins, indicating that the E4 gene was more
The HPV16 L2 reference sequence is NC_001526.4. a Represents sites with significant differences.
high frequency in L2 proteins, including D43E, S122P, V243I, T245A, L266F, L266V, S269P, L330F, D334N, T351P, T351S, T352P, T352A, S378V, S378F, S384A, V385I, I420T, A424T, I428L and A443G. The amino acid variation I428L was present almost uniquely in Asia, and the frequencies of S269P and L330F were higher than those of the reference amino acids; at position 330 of the L2 protein, phenylalanine was more common than the reference amino acid leucine in Europe, Asia, and North America [25]. We also found high-frequency variations of S269P and L330F in the L2 protein, which is consistent with the previous reports. In addition, other high-frequency amino acid variations were found in the L2 protein, including D84N, R90K, D96N, E111K and D129N (Table 4). We found that the L2 missense variations G3377A (R2Q), G3599A (G76E), G3703A (E111K), and G3757A (D129N) were significantly more frequent in the case groups (cervical cancer) than in the control groups (non-cervical cancer) (P < 0.05). These variations may affect the integration of the HPV16 viral genome into the cell chromosomes.
Xinjiang is a multiethnic region. The 90 samples we studied included 52 samples from the Han ethnic group, 31 in the control groups and 21 in the case groups, and we recounted several loci with more variations in this subgroup (see Table 8 for details). We found that the missense variation A3115C of the E5 gene was significantly more frequent in the case groups than in the control groups (P = 0.02). However, as seen in Table 6 above, A3115C did not differ between the case and control groups in the full set of 90 samples, so the A3115C variation may have different effects in different ethnic groups, which needs further validation.
The current study revealed for the first time the sequence variations in the HPV16 E4, E5 and L2 genes in Xinjiang and the distribution of these variations among different ethnic groups. The sample size of the current study is relatively small and should be increased in future studies, in particular to include samples from more ethnic groups. Based on the findings of the current study, the variations A3115C, G3377A, G3599A, G3703A and G3757A should be further investigated in cell experiments to determine whether they affect viral immune evasion and the integration of the viral genome into cell chromosomes.

Conclusion
Phylogenetic tree analysis showed that 87 samples were European strains and 3 were Asian strains; no other strains were found, and G4181A was related to the Asian strains. The HPV16 L2 gene missense variants G3377A, G3599A, G3703A, and G3757A were significantly more frequent in the case groups than in the control groups.

Fig. 1 Phylogenetic tree analysis of the HPV16 E4, E5 and L2 genes. The red dotted line marks the reference virus strain.

Table 1
Primer information. F is the forward primer and R is the reverse primer.

Table 3
HPV16 E5 gene variation and amino acid changes. AA is the Asian-American strain, As is the Asian strain, and Af is the African strain.

Table 4
HPV16 L2 gene variation and amino acid changes

Table 5
Genetic variation of genomic HPV16 E4 in case and control groups

Table 6
Genetic variation of genomic HPV16 E5 in case and control groups. a Represents sites with significant differences.

Table 7
Genetic variation of genomic HPV16 L2 in case and control groups

Table 8
Distribution of high variation frequency sites in Han nationality. a Represents sites with significant differences.